Optimality Properties of a Proposed Precursor to the Genetic Code 
We calculate the optimality of a doublet precursor to the canonical genetic code with respect
to mitigating the effects of point mutations and compare our results to corresponding ones for the
canonical genetic code. We find that the proposed precursor is much less optimal than
the canonical code. Our results render unlikely the notion that the doublet precursor was an
intermediate state in the evolution of the canonical genetic code. These findings support the notion
that code optimality reflects evolutionary dynamics, and that if such a doublet code originally had
a biochemical significance, it arose before the emergence of translation.

PACS numbers: 82.39.Pj, 87.23.Kg, 87.10.Rt



It is now well-established that the canonical genetic code is not a
frozen accident, but exhibits a pattern of amino acid-codon
correspondences that has the effect of making the code insensitive to
certain classes of point mutation or translation error
[1, 2, 3, 4, 5, 6, 7]. A variety of schemes [8], including ones
invoking evolutionary dynamics and stereochemistry [9, 10, 11], have
been put forward to explain this pattern and others [12] in the
genetic code (for recent reviews, see [13, 14]). It is important to
stress that the optimality of the code is most manifest with respect
to only one particular class of amino acid attributes, related to the
free amino acid polar requirement [15], and this suggests the code is
a very ancient part of the cell's machinery, functioning either in its
present role of translation, or in some earlier unknown function.
Additionally, it has been shown recently that the genetic code has
extreme error-minimizing optimality, being more optimal than all but
one or two random codes generated in sets of ten million [?]. This
result lends strong support to the suggestion that the code's
evolutionary dynamics was dominated by collective mechanisms arising
from horizontal gene transfer [16]. Computational evidence shows that
core chemical affinities in the genetic code are fully compatible
with, and independent from, evolutionary dynamics that lead to
error-minimizing optimality [17], suggesting that error-minimizing
optimality is not a by-product of chemistry but arises from the
evolutionary dynamics.

In this report, we attempt to ascertain to what extent, 
if any, error-minimizing optimality can be used to con- 
strain a proposed scenario for the evolution of the genetic 
code. If the optimality with respect to polar requirement 
was a feature of the code from very early times, then pre- 
cursor code proposals must respect error-minimizing op- 
timality to a significant degree. Alternatively, proposed 
precursor codes may claim to date prior to any code evo- 
lution, and to be the product of other factors alone. Such 
precursors would not be expected to display a significant 
level of error-minimizing optimality, assuming that it is 
indeed the case that optimality is primarily a reflection 
of evolutionary dynamics. Here we show that a specific 
biochemically-motivated precursor code does not show 
evidence for significant error-minimizing optimality, even 
though it is a projection of the canonical code; these re- 
sults support the notion that error-minimizing optimality 
primarily reflects evolutionary dynamics, and imply that
this type of precursor code, if it ever existed, would have 
arisen prior to the emergence of translation. 

Copley, Smith and Morowitz have suggested that first and, to a lesser
extent, second base assignments in the canonical code would arise if
the code has its origin in amino acid synthesis channels embedded in
dinucleotide complexes prior to the emergence of translation [18]. The
proposal exploits the strong constraints such a theory imposes on the
first two bases of the genetic code to generate a specific precursor
doublet code based on a projection of the canonical genetic code to a
doublet code. For most of the projection, the third base is
sufficiently redundant that the first two bases are sufficient to
define the amino acid coded for by the doublet. In the event that the
third bases associated with a doublet codon code for multiple amino
acids, the proposal favors the simpler of the amino acids (Table I).
They further refine the proposal by incorporating possible precursor
amino acids motivated by their study of the biosynthetic pathways for
amino acids (not shown) [18].
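
The projection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of Copley et al.: the helper names `doublet_candidates` and `project` and the tie-breaking callback `prefer` are hypothetical, bases are written with T rather than U, and the rule for which amino acid counts as "simpler" is left to the caller because that ranking is not reproduced here. Only the standard genetic code table is taken as given.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third base); '*' marks stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CANONICAL = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def doublet_candidates(code=CANONICAL):
    """Map each doublet to the set of amino acids encoded by its triplets.

    Termination codons are ignored, so a doublet such as TA keeps only Tyr.
    """
    candidates = {}
    for codon, aa in code.items():
        if aa != "*":
            candidates.setdefault(codon[:2], set()).add(aa)
    return candidates


def project(code, prefer):
    """Choose one amino acid per doublet.

    `prefer(doublet, amino_acids)` is a hypothetical tie-breaking rule that is
    called only when the third base distinguishes several amino acids.
    """
    return {
        doublet: next(iter(aas)) if len(aas) == 1 else prefer(doublet, aas)
        for doublet, aas in doublet_candidates(code).items()
    }
```

In this sketch nine of the sixteen doublets map to a single amino acid once stop codons are excluded, and the remaining seven (for example AG, whose triplets encode Ser and Arg) require the tie-breaking rule.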

To further assess and characterize the proposed precursor code in
[18], we analyze the degree to which it exhibits error-minimizing
optimality. As noted above, the proposed precursor code is based
primarily on arguments about biosynthetic pathways rather than
evolutionary considerations. Additionally, it explicitly dates to
prior to translation [18]. All mechanisms of which we are aware for
code evolvability explicitly require translation machinery (see for
example [?]). Thus we anticipate that the proposed precursor code
should contain little, if any, evidence for optimality.

We have analyzed the former of these proposed precursor doublet codes
(see Table I) for error-minimizing optimality using the "experimental
polar requirement" (EPR) [?] derived originally by Woese and
co-workers. We have also analyzed the precursor using a modern
computational update of the polar requirement (CPR) [24]. Analysis
with the CPR is of particular interest, because it is the measure of
amino acid difference that, when applied in code optimality analysis
algorithms to the canonical genetic code, gives rise to the extreme
optimality cited above [?]. Thus the CPR can be considered to capture
some essential aspect of amino acid chemistry of particular relevance
during the evolution of the genetic code. Analysis of the more refined
version of the proposed precursor code is difficult because the polar
requirements for the proposed precursor amino acids are unknown. We
believe, however, that the qualitative results apply to both versions
of the proposed precursor, since large changes in the chemical
properties as amino acids are refined at the same position are
unlikely.

TABLE I. Proposed precursor code from Ref. [18]. Row is first
base, column is second base.

To analyze the error-minimizing optimality in the proposed precursor
code, we used the point mutation code analysis algorithm described in
[3] and [?]. The presentation of this algorithm in [?] considers an
ensemble of random genetic codes as mappings from the set of codons
(minus the termination codons) to the set of amino acids,
$GC^i : \mathrm{Codons} \to \mathrm{Amino\ Acids}$, where $i$ indexes
a particular set of assignments of codons to amino acids, with one
index reserved for the canonical code. The other versions are
generated by randomly permuting amino acid labels, again excluding
termination codons. The optimality $O_i$ can then be calculated as

$$O_i^{-1} = \sum_{(c,c') \notin \mathrm{Ter}} \left( GC^i(c) - GC^i(c') \right)^2 \qquad (1)$$

where $(c, c') \notin \mathrm{Ter}$ denotes a sum over nearest
neighbor codons, with the nearest neighbors of a codon defined by its
single point mutations, and with all mutations to or from a
termination codon excluded. The difference between two amino acids is
taken in the chosen amino acid property, here the polar requirement,
so a smaller sum corresponds to a more optimal code.

To allow simple comparison of the results that does not depend on
rescaling of amino acid properties, we compute the probability
$P_b = \Pr(O > O_i)$ that a random realization is more optimal than
the canonical code, estimated as the fraction of our random codes
that are more optimal than the canonical code.
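
The scoring and sampling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the published implementation: the function names `inverse_optimality` and `estimate_p_b`, the dictionary representation of a code, and the `prop` table of amino acid property values are all hypothetical choices, and no transition or transversion weighting is applied.

```python
import random


def inverse_optimality(code, prop, bases="TCAG"):
    """Sum of squared property differences over single-point-mutation neighbors.

    Codons missing from `code` (for example termination codons) are skipped.
    Each neighbor pair is visited in both directions, which multiplies every
    code's score by the same factor of two and leaves comparisons unchanged.
    """
    total = 0.0
    for codon, aa in code.items():
        for pos, old in enumerate(codon):
            for base in bases:
                if base == old:
                    continue
                neighbor = codon[:pos] + base + codon[pos + 1:]
                if neighbor in code:
                    total += (prop[aa] - prop[code[neighbor]]) ** 2
    return total


def estimate_p_b(code, prop, n_samples, seed=0, bases="TCAG"):
    """Fraction of label-permuted codes strictly more optimal than `code`.

    Returns (p_b, n_better). Random codes are built by permuting amino acid
    labels, so the block structure of the tested code is preserved.
    """
    rng = random.Random(seed)
    labels = sorted(set(code.values()))
    reference = inverse_optimality(code, prop, bases)
    n_better = 0
    for _ in range(n_samples):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(labels, shuffled))
        trial = {codon: relabel[aa] for codon, aa in code.items()}
        # A smaller sum of squared differences means a more optimal code.
        if inverse_optimality(trial, prop, bases) < reference:
            n_better += 1
    return n_better / n_samples, n_better
```

Because the code is passed as a plain mapping from codon strings to amino acid labels, the same functions apply to doublet and triplet codes; termination codons are excluded simply by leaving them out of the mapping.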



The error in the computed $P_b$ can be estimated using an analytical
realization of bootstrap resampling derived from an exact
correspondence with the statistics of the asymmetric one dimensional
random walk. This correspondence shows that if $N$ codes are sampled,
and $N_{O>O_i}$ are more optimal than the code being tested, then
$P_b$ with standard error is given by the expression

$$P_b = \frac{N_{O>O_i}}{N} \left( 1 \pm \frac{1}{\sqrt{N_{O>O_i}}} \right) \qquad (2)$$



While this is in line with naive expectations for the form of error,
the problem of sampling more optimal random codes is a problem of rare
event sampling, which is frequently unstable and prone to nonstandard
large errors. This makes a rigorous derivation of the exact error a
key result, essential for robust interpretation of optimality
calculation results. The form of the error also informs the
computations. It is clear from Eq. (2) that the relevant sample size
for a statistically sound analysis is not $N$, but the number of more
optimal codes sampled, $N_{O>O_i}$ [?]. A reasonable minimum is,
perhaps, 20 more optimal codes sampled to get a statistical estimate.
Much larger samples would be preferable, but in many applications may
be hard to obtain due to computational limitations encountered when
analyzing highly optimized codes.
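
As a worked illustration of the error estimate, the sketch below assumes that the standard error takes the counting form $\sqrt{N_{O>O_i}}/N$, which is consistent with the statement that the relevant sample size is the number of more optimal codes. The function name `p_b_with_error`, the `min_better` threshold argument, and the counts used in the example are hypothetical.

```python
import math


def p_b_with_error(n_better, n_total, min_better=20):
    """Return (p_b, standard_error, enough_samples).

    Assumes a counting error of sqrt(n_better) / n_total, so the relative
    error depends only on the number of more optimal codes found.
    """
    if n_total <= 0 or not 0 <= n_better <= n_total:
        raise ValueError("need 0 <= n_better <= n_total and n_total > 0")
    p_b = n_better / n_total
    std_err = math.sqrt(n_better) / n_total
    return p_b, std_err, n_better >= min_better


# Hypothetical counts: 1440 more optimal codes among 100000 sampled.
p_b, err, ok = p_b_with_error(1440, 100_000)
print(f"P_b = {p_b:.4f} +/- {err:.5f}, enough samples: {ok}")
```

Under this assumption, halving the relative error requires four times as many more optimal codes, which is why highly optimized codes demand very large ensembles.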

Applying the algorithm to the proposed precursor code, we estimated
$P_b = (1.44 \pm 0.038) \times 10^{-2}$ with the experimental polar
requirement, or $P_b = (7.95 \pm 0.282) \times 10^{-3}$ with the
computational polar requirement [25]. To compare, we applied this
simplified code analysis algorithm to the canonical genetic code. The
canonical genetic code has optimality of
$P_b = (1.18 \pm 0.109) \times 10^{-4}$ or
$P_b = (4.7 \pm 0.686) \times 10^{-5}$ with the EPR and CPR
respectively (the extreme optimality discussed above included
transition and transversion biases for each base position in the
calculation [?]). Thus the precursor is, with either the EPR or the
CPR, two orders of magnitude less optimal than the canonical genetic
code evaluated with the equivalent algorithm.

As discussed above, the derivation of the doublet code in Table I
depended on projecting the third base onto the doublets by favoring
the simplest amino acid coded for by the triplet codons associated
with a given doublet. We repeated the optimality analysis for versions
of the doublet code that favored more complex amino acids at
individual doublets (such as substituting Arg for Ser at the AG
position). None of the modified doublet codes displayed a significant
increase in optimality over the version in Table I. Thus a more
optimal version of the precursor code, which respects the underlying
biosynthesis theory, would differ in several positions from the
proposal by Copley et al. [18].

Our results show that the proposed precursor code 
has weak error-minimizing optimality with respect to the 
polar requirement, compared to the canonical genetic 
code. This result is surprising in one respect, because 
the doublet code is a projection of the canonical code. 
Several interpretations are possible. (1)
The doublet precursor code is not an intermediate evo- 
lutionary stage from some earlier precursor code; this 
is consistent with the basis for the original proposal of 
this code as a biosynthetic pathway, but is puzzling be- 
cause the later canonical triplet code is optimized with 
respect to the free amino acid polar requirement. (2) 
The precursor has no biological significance at all, and 
did not evolve from an earlier precursor, which exhibits 
free amino acid polar requirement optimality. (3) The 
precursor doublet code predates evolution for error min- 
imization, and if the amino acid synthesis scheme is cor- 
rect, then modifications to the doublet code during its 
evolution to today's canonical code are responsible for 
its observed error-minimizing optimality. The relatively 
large Pb value (i.e. small amount of observed optimal- 
ity) in the precursor is an artifact of deriving the doublet 
code from the highly evolved canonical code. 

Our analysis does not address the question of whether 
or not the detailed biochemical theory proposed is cor- 
rect, because presumably optimal precursor codes that 
are consistent with both the biochemical theory and un- 
corrupted by evolution could be constructed. 

We gratefully acknowledge discussions with Carl 
Woese, Rob Knight, Shelley Copley, Eric Smith and 
Harold Morowitz. This material is based on work sup- 
ported by the National Science Foundation under Grant 
No. NSF-EF-0526747. 